Title: Adaptive Sampling Methods for Scaling up Knowledge Discovery Algorithms (Extended Revised Version of TR-C131)

Authors

Carlos Domingo

Ricard Gavalda

Osamu Watanabe

Abstract

Scalability is a key requirement for any KDD and data mining algorithm, and one of the biggest research challenges is to develop methods that make it possible to use large amounts of data. One possible approach for dealing with huge amounts of data is to take a random sample and do data mining on it, since for many data mining applications approximate answers are acceptable. However, as argued by several researchers, random sampling is difficult to use because of the difficulty of determining an appropriate sample size. In this paper, we take a sequential sampling approach to solving this difficulty, and propose an adaptive sampling method that solves a general problem covering many actual problems arising in applications of discovery science. An algorithm following this method obtains examples sequentially in an on-line fashion, and it determines from the obtained examples whether it has already seen a large enough number of examples. Thus, the sample size is not fixed a priori; instead, it adaptively depends on the situation. Due to this adaptiveness, if we are not in a worst-case situation, as fortunately happens in many practical applications, then we can solve the problem with a number of examples much smaller than that required in the worst case. We prove the correctness of our method and estimate its efficiency theoretically. To illustrate its usefulness, we consider one concrete example of using sampling, provide an algorithm based on our method, and show its efficiency by experimental evaluation. (This is an extended revised version of TR-C131.)
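The idea of letting the data decide when to stop sampling can be illustrated with a small sketch. The code below is not the algorithm from the report; it is a generic sequential selection rule, assuming scores bounded in [0, 1], a Hoeffding confidence radius, and a union bound over sampling steps. The names `adaptive_select`, `bernoulli_stream`, `scorers`, and `delta` are hypothetical and chosen only for this illustration.

```python
import math
import random
from typing import Callable, Iterator, Sequence, Tuple


def adaptive_select(
    examples: Iterator,
    scorers: Sequence[Callable[[object], float]],
    delta: float = 0.05,
    max_examples: int = 1_000_000,
) -> Tuple[int, int]:
    """Pick the candidate with the highest mean score, sampling adaptively.

    Assumes at least two scorers, each returning a value in [0, 1].
    Returns (index of the chosen candidate, number of examples used).
    """
    k = len(scorers)
    sums = [0.0] * k
    t = 0
    for x in examples:
        t += 1
        for i, score in enumerate(scorers):
            sums[i] += score(x)
        # Hoeffding radius at step t; the failure budget delta is split
        # over the k candidates and over steps as delta / (k * t * (t + 1)).
        alpha = math.sqrt(math.log(2 * k * t * (t + 1) / delta) / (2 * t))
        means = sorted(((s / t, i) for i, s in enumerate(sums)), reverse=True)
        # Stop once the leader is separated from the runner-up by more than
        # two radii, or once the safety cap on the sample size is reached.
        if means[0][0] - means[1][0] > 2 * alpha or t >= max_examples:
            return means[0][1], t
    # The stream ended before the stopping rule fired: return the leader.
    return max(range(k), key=lambda i: sums[i]), t


def bernoulli_stream(probs: Sequence[float], seed: int) -> Iterator[tuple]:
    """Endless stream of examples; entry i is 1.0 with probability probs[i]."""
    rng = random.Random(seed)
    while True:
        yield tuple(1.0 if rng.random() < p else 0.0 for p in probs)


# Candidate i scores an example by reading its i-th entry.
scorers = [lambda x, i=i: x[i] for i in range(2)]
easy_best, easy_n = adaptive_select(bernoulli_stream([0.9, 0.1], 0), scorers)
hard_best, hard_n = adaptive_select(bernoulli_stream([0.55, 0.45], 0), scorers)
print("well separated:", easy_best, easy_n)
print("nearly tied:", hard_best, hard_n)
```

In this sketch the well-separated case should stop after far fewer examples than the nearly tied one, which is the behavior the abstract describes: the sample size depends on how hard the instance is rather than on a worst-case bound fixed in advance.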


Similar resources

Technical Reports on Mathematical and Computing Sciences: TR-C136 Title: Adaptive Sampling Methods for Scaling up Knowledge Discovery Algorithms (Extended Revised Version of TR-C131)


Knowledge discovery from patients’ behavior via clustering-classification algorithms based on weighted eRFM and CLV model: An empirical study in public health care services

The rapid growing of information technology (IT) motivates and makes competitive advantages in health care industry. Nowadays, many hospitals try to build a successful customer relationship management (CRM) to recognize target and potential patients, increase patient loyalty and satisfaction and finally maximize their profitability. Many hospitals have large data warehouses containing customer ...


A New Adaptive Sampling Method for Scalable Learning

Scaling up data mining algorithms to handle huge data sets is an important issue in machine learning and knowledge discovery. Random sampling is often used to achieve better scalability in learning from massive amount of data. Adaptive sampling offers advantages over traditional batch sampling methods in that adaptive sampling often uses much lower number of samples and thus better efficiency w...



Adaptive Rule-Base Influence Function Mechanism for Cultural Algorithm

This study proposes a modified version of cultural algorithms (CAs) which benefits from rule-based system for influence function. This rule-based system selects and applies the suitable knowledge source according to the distribution of the solutions. This is important to use appropriate influence function to apply to a specific individual, regarding to its role in the search process. This rule ...



Publication date: 1999
